On F0 Trajectory Opt for Very High-quality Spee
Abstract
An optimized fundamental frequency (F0) trajectory extraction method is introduced that alleviates systematic F0 glitches at vowel-nasal boundaries and in the vicinity of consonants. The proposed method employs minimum-phase group delay compensation for apparent F0 modulations caused by variations in the corresponding vocal tract transfer functions. The method can also be regarded as an implementation of a generalized version of analysis by synthesis. Evaluation using electroglottograph (EGG) reference signals revealed that the proposed method reduces the systematic biases by 50%.
Related resources
Generating natural F0 trajectory with additive trees
In HMM-based TTS, while the segmental quality of synthesized speech is quite acceptable, intonation, especially at the sentence level, tends to be somewhat bland. The maximum likelihood (ML) criterion used in HMM training and parameter trajectory generation is partially responsible for the blandness. Additionally, the F0 trajectory thus generated has a smaller dynamic range than that of natural...
Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis
This paper introduces a general and flexible framework for F0 and aperiodicity (additive non-periodic component) analysis, specifically intended for high-quality speech synthesis and modification applications. The proposed framework consists of three subsystems: instantaneous frequency estimator and initial aperiodicity detector, F0 trajectory tracker, and F0 refinement and aperiodicity extract...
A new synthesis algorithm using phase information for TTS systems
New speech synthesis algorithms capable of flexible prosody (especially F0) modification are desired for a high-quality TTS system. TD-PSOLA is the most popular synthesis algorithm. The algorithm shows very high quality when F0 modification is limited. However, the quality degradation due to pitch epoch detection error becomes severe as the F0 modification factor becomes large. On the othe...
Mapping Voice to Affect: Japanese listeners
This paper reports the results of perception tests administered to speakers of Japanese as part of a cross-language investigation of how voice quality and f0 combine in the signalling of affect. Three types of synthesised stimuli were presented: (1) ‘VQ only’ involving variations in voice quality and a neutral f0; (2) ‘f0 only’, with different f0 contours and modal voice; and (3) combined ‘VQ +...
Improved generation of prosodic features in HMM-based Mandarin speech synthesis
The HMM-based Text-to-Speech System can produce high-quality synthetic speech with flexible modeling of spectral and prosodic parameters. However, the prosodic features, like F0 and duration trajectories, generated by HMM-based speech synthesis are often excessively smoothed and lack prosodic variance. In HMM-based TTS, durations are typically modeled statistically using state duration probabili...